Estimating Tennis In-Match-Win Probability with Bayesian Modeling

Ben Moolman

St. Lawrence University

2024-04-26

Introduction

Bayesian Prior, Data, and Posterior

Johnson, A. A., Ott, M. Q., & Dogucu, M. (2021). Bayes Rules! An Introduction to Applied Bayesian Modeling

Prior Distribution

Data + Posterior

Prior Distribution: Paired Competition Model

  • Define \(Y_{ijk}\) to be a Bernoulli random variable equal to
    • \(1\) if player \(i\) wins the \(k^{th}\) point against player \(j\).
    • \(0\) if player \(i\) loses the \(k^{th}\) point against player \(j\).
  • \(\text{E}(Y_{ijk}) \equiv \pi_{ijk}\), the probability that Player \(i\) wins the \(k^{th}\) point against Player \(j\).

Prior Distribution: Paired Competition Model

\[ \text{logit}(\pi_{ijk}) = \beta_{alcaraz}X_{alcaraz} + \beta_{sinner}X_{sinner} + \ldots + \beta_{ruud}X_{ruud}, \]

  • \(X_{alcaraz}\) is equal to
    • \(1\) if Alcaraz is player \(i\) on the \(k^{th}\) point.
    • \(0\) if Alcaraz is neither player \(i\) nor player \(j\) on the \(k^{th}\) point.
    • \(-1\) if Alcaraz is player \(j\) on the \(k^{th}\) point.
  • \(\beta_{alcaraz}\) represents a unitless “ability” of Alcaraz.
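The \(+1\) / \(-1\) / \(0\) coding above can be sketched in a few lines of Python. This is only an illustration: the function names `design_row` and `point_win_prob` and the ability values are hypothetical, not estimates from the talk.

```python
import math

def design_row(players, player_i, player_j):
    """Return the indicator X for every player: +1 for player i,
    -1 for player j, and 0 for anyone not involved in the point."""
    return [1 if name == player_i else -1 if name == player_j else 0
            for name in players]

def point_win_prob(abilities, player_i, player_j):
    """Inverse logit of the linear predictor sum(beta * X)."""
    players = list(abilities)
    x = design_row(players, player_i, player_j)
    eta = sum(abilities[name] * x_k for name, x_k in zip(players, x))
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical ability values, chosen only for illustration.
abilities = {"alcaraz": 0.30, "sinner": 0.20, "ruud": 0.05}

print(design_row(list(abilities), "alcaraz", "sinner"))  # [1, -1, 0]
print(point_win_prob(abilities, "alcaraz", "sinner"))
```

Swapping the roles of player \(i\) and player \(j\) flips every sign in the row, so the two point-win probabilities sum to one.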

Paired Competition Model: Example

  • log-odds of Carlos Alcaraz (player \(i\)) winning a point against Jannik Sinner (player \(j\)):

\[\begin{equation} \begin{aligned} \text{logit}(\pi_{ijk}) & = \beta_{alcaraz}(1) + \beta_{sinner}(-1) + \ldots + \beta_{ruud}(0) \\ & = \beta_{alcaraz} - \beta_{sinner} \end{aligned} \end{equation}\]
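Applying the inverse logit to this difference gives the point-win probability itself, which makes clear that only the gap between the two abilities matters:

\[ \pi_{ijk} = \frac{e^{\beta_{alcaraz} - \beta_{sinner}}}{1 + e^{\beta_{alcaraz} - \beta_{sinner}}} \]

For instance, equal abilities give a difference of \(0\) and therefore a point-win probability of \(0.5\).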

Prior Distribution

Adding a Server Effect

\[\begin{equation} \begin{aligned} \text{logit}(\pi_{ijk}) = & \beta_{alcaraz}X_{alcaraz} + \beta_{sinner}X_{sinner} + \ldots + \beta_{ruud}X_{ruud} + \\ & \alpha_{alcaraz}X_{alcaraz,s} + \alpha_{sinner}X_{sinner,s} + \ldots + \alpha_{ruud}X_{ruud,s} \end{aligned} \end{equation}\]

  • \(X_{alcaraz,s}\) is equal to
    • \(1\) if Alcaraz is the serving player \(i\) on point \(k\).
    • \(0\) if Alcaraz is the returning player on point \(k\) or if Alcaraz is neither player \(i\) nor player \(j\).
    • \(-1\) if Alcaraz is the serving player \(j\) on point \(k\).
  • \(\alpha_{alcaraz}\) represents the bump, on the log-odds scale, in Alcaraz's chance of winning a point when he is serving.

Server Effect: Example

  • log-odds of Carlos Alcaraz (player \(i\)) winning a point against Jannik Sinner (player \(j\)) with Alcaraz serving on point \(k\):

\[\begin{equation} \begin{aligned} \text{logit}(\pi_{ijk}) & = \beta_{alcaraz}(1) + \beta_{sinner}(-1) + \ldots + \beta_{ruud}(0) + \\ & \;\;\;\; \alpha_{alcaraz}(1) + \alpha_{sinner}(0) + \ldots + \alpha_{ruud}(0) \\ & = \beta_{alcaraz} + \alpha_{alcaraz} - \beta_{sinner} \end{aligned} \end{equation}\]

Server Effect: Example

  • log-odds of Carlos Alcaraz (player \(i\)) winning a point against Jannik Sinner (player \(j\)) with Sinner serving on point \(k\):

\[\begin{equation} \begin{aligned} \text{logit}(\pi_{ijk}) & = \beta_{alcaraz}(1) + \beta_{sinner}(-1) + \ldots + \beta_{ruud}(0) + \\ & \;\;\;\; \alpha_{alcaraz}(0) + \alpha_{sinner}(-1) + \ldots + \alpha_{ruud}(0) \\ & = \beta_{alcaraz} - \beta_{sinner} - \alpha_{sinner} \end{aligned} \end{equation}\]
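The two server examples can be checked with a small Python sketch. The function names `serve_logit` and `inv_logit` and all parameter values are hypothetical and serve only to illustrate the sign conventions.

```python
import math

def serve_logit(beta, alpha, player_i, player_j, server):
    """Log-odds that player_i wins a point against player_j.

    The ability terms always contribute beta_i - beta_j. The server term
    adds alpha_i when player_i serves (X_{i,s} = +1) and subtracts alpha_j
    when player_j serves (X_{j,s} = -1).
    """
    eta = beta[player_i] - beta[player_j]
    if server == player_i:
        eta += alpha[player_i]
    elif server == player_j:
        eta -= alpha[player_j]
    else:
        raise ValueError("server must be player_i or player_j")
    return eta

def inv_logit(eta):
    """Map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical parameter values, not estimates from the talk.
beta = {"alcaraz": 0.30, "sinner": 0.20}
alpha = {"alcaraz": 0.45, "sinner": 0.40}

on_serve = serve_logit(beta, alpha, "alcaraz", "sinner", server="alcaraz")
on_return = serve_logit(beta, alpha, "alcaraz", "sinner", server="sinner")
print(inv_logit(on_serve), inv_logit(on_return))
```

With positive server effects, Alcaraz's point-win probability is higher in the first case (his serve) than in the second (Sinner's serve).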

Prior Distribution

Calculating In-Match-Win Probability

Match-Win Probability vs Winning-Point Probability (on Serve)

We work with three different probabilities:

  • \(p_{sinner}\) and \(p_{alcaraz}\) are the probabilities of each player winning a point on his own serve; both are updated throughout the match
  • The overall match-win probability is calculated from these two probabilities
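One way to see how two serve-point probabilities determine a match-win probability is the sketch below. It is not the implementation used in the talk. It assumes points are independent with a fixed probability on each player's serve, that every set (including the last) ends in a standard 7-point tiebreak at 6-6, and that each set starts with the same first server. It also starts from 0-0, whereas an in-match version would start the same recursions from the current score. All function names are hypothetical.

```python
from functools import lru_cache
from math import comb

def hold_prob(p):
    """P(server wins a game) when each point is won with probability p."""
    q = 1.0 - p
    reach_deuce = 20 * p**3 * q**3
    return p**4 * (1 + 4*q + 10*q**2) + reach_deuce * p**2 / (p**2 + q**2)

def tiebreak_prob(p_a, p_b, a_first=True):
    """P(A wins a first-to-7, win-by-2 tiebreak)."""
    @lru_cache(maxsize=None)
    def f(a, b):
        if a >= 7 and a - b >= 2:
            return 1.0
        if b >= 7 and b - a >= 2:
            return 0.0
        if a == b and a >= 6:
            # From 6-6 on, points come in pairs with one serve each.
            up, down = p_a * (1 - p_b), (1 - p_a) * p_b
            return up / (up + down)
        # Serve order: one point for the first server, then two each.
        first_serves = ((a + b + 1) // 2) % 2 == 0
        w = p_a if first_serves == a_first else 1 - p_b
        return w * f(a + 1, b) + (1 - w) * f(a, b + 1)
    return f(0, 0)

def set_prob(p_a, p_b, a_first=True):
    """P(A wins a set), with a tiebreak at 6-6."""
    hold_a, hold_b = hold_prob(p_a), hold_prob(p_b)
    tiebreak = tiebreak_prob(p_a, p_b, a_first)
    @lru_cache(maxsize=None)
    def f(a, b):
        if a >= 6 and a - b >= 2:
            return 1.0
        if b >= 6 and b - a >= 2:
            return 0.0
        if a == 6 and b == 6:
            return tiebreak
        a_serves = ((a + b) % 2 == 0) == a_first
        w = hold_a if a_serves else 1 - hold_b
        return w * f(a + 1, b) + (1 - w) * f(a, b + 1)
    return f(0, 0)

def match_prob(p_a, p_b, sets_to_win=3):
    """P(A wins the match) from the start, treating sets as independent."""
    s = set_prob(p_a, p_b)
    n = sets_to_win
    return sum(comb(n - 1 + k, k) * s**n * (1 - s)**k for k in range(n))

# Serve-point probabilities from Case Study 1 (best of five sets).
print(match_prob(0.6229, 0.5606, sets_to_win=3))
```

Because a small edge on each point is compounded over games, sets, and the match, the match-win probability moves much further from 0.5 than the point-level probabilities do.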

Case Study 1: Alcaraz vs Sinner

  • In the 2022 US Open, Carlos Alcaraz faced Jannik Sinner in the quarterfinals
  • Alcaraz defeated Sinner in 5 sets, 6-3, 6-7(7), 6-7(0), 7-5, 6-3
  • We will look at the probability of Alcaraz winning the match
  • Let \(\tilde{p}\) be the posterior median probability of winning a point on serve:
    • \(\tilde{p}_{alcaraz} = 0.6229\)
    • \(\tilde{p}_{sinner} = 0.5606\)

Posterior Distribution

Posterior Sampling

  • The posterior distribution is sampled 4000 times
    • Each of the 4000 draws is plugged into the match-win probability chain.
  • Even if the center of the posterior distribution does not change, the match-win probability is still affected by the variability of the posterior distribution.
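The effect of posterior variability can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the Beta draws stand in for real posterior samples, and the game-level `hold_prob` stands in for the full match-win chain, since both are nonlinear functions of the serve-point probability.

```python
import random
from statistics import fmean

def hold_prob(p):
    """P(server wins a game) when each point is won with probability p.
    Used here as a small stand-in for the full match-win calculation."""
    q = 1.0 - p
    return p**4 * (1 + 4*q + 10*q**2) + 20 * p**3 * q**3 * p**2 / (p**2 + q**2)

def posterior_mean_win_prob(draws, win_prob=hold_prob):
    """Push every posterior draw through win_prob and average the results."""
    return fmean(win_prob(p) for p in draws)

rng = random.Random(2024)

# Two hypothetical posteriors with the same mean (0.62) but different spread.
narrow = [rng.betavariate(620, 380) for _ in range(4000)]
wide = [rng.betavariate(6.2, 3.8) for _ in range(4000)]

print(fmean(narrow), fmean(wide))
print(posterior_mean_win_prob(narrow), posterior_mean_win_prob(wide))
```

The two sets of draws have nearly the same center, yet the averaged win probabilities differ, because the win probability is a nonlinear function of the serve-point probability.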

Case Study 1: Changing Prior

Case Study 2: Gauff vs Sabalenka

  • In the 2023 US Open, Coco Gauff faced Aryna Sabalenka in the final
  • Gauff defeated Sabalenka in 3 sets, 2-6, 6-3, 6-2
  • We will look at the probability of Gauff winning the match
  • Probability of winning a point on serve at the start of the match:
    • \(\tilde{p}_{gauff} = 0.5880\)
    • \(\tilde{p}_{sabalenka} = 0.5475\)

Conclusion

  • Dynamic Nature

  • Data-driven Insights

  • Future Directions

Acknowledgements

  • Jeff Sackmann GitHub
  • Skoval deuce package
  • James Wolpe SLU ’23 prior distribution
  • Dr. Matt Higham